

## **IN THE CLAIMS**

Please amend the claims as follows:

1. (currently amended) A programmable processor comprising:

a data path;

an external interface operable to receive data from an external source and communicate the received data over the data path;

a register file containing a plurality of registers each having a register width, the register file coupled to the data path and operable to support processing of a plurality of threads;

an execution unit coupled to the data path, the execution unit operable to execute a plurality of instruction streams from the plurality of threads, each instruction stream including a single instruction that <s>operates on</s> specifies an operation, the operation to be performed on each one of a plurality of data elements in partitioned fields of at least one of the registers to produce a catenated result, each of the data elements having an elemental width smaller than the register width.

2. (original) The processor of claim 1 wherein the execution unit comprises a pipeline having a plurality of stages and wherein the pipeline interleaves execution of instructions from the plurality of instruction streams.
3. (original) The processor of claim 2 wherein the pipeline is operable to simultaneously contain states of execution of at least two instructions from different instruction streams.
4. (original) The processor of claim 2 wherein execution of the instructions is interleaved in a round-robin manner.
5. (currently amended) The processor of claim 1 wherein the processor ensures only one thread from the plurality of threads can <s>have</s> handle an exception <s>handled</s> at any given time.

6. (original) The processor of claim 1 further comprising a virtual memory addressing unit and a cache operable to store data communicated between the external interface and the data path.

7. (currently amended) The processor of claim 1 wherein the execution unit is further operable to, in response to decoding a second single instruction specifying a <s>first</s> <u>third</u> and a <s>second</s> <u>fourth</u> register each containing a plurality of operands, multiply the plurality of floating point operands in the <s>first</s> <u>third</u> register by the plurality of operands in the <s>second</s> <u>fourth</u> register to produce a plurality of products and provide the plurality of products to partitioned fields of a result register as a second catenated result.
8. (currently amended) A programmable processor comprising:

a data path;

an external interface operable to receive data from an external source and communicate the received data over the data path;

first and second register files containing a plurality of registers each having a register width, the first and second register files coupled to the data path and operable to support processing of first and second threads, respectively;

an execution unit coupled to the data path, the execution unit operable to execute first and second instruction streams from the first and second threads, respectively, the first and second instruction streams each including a single instruction that <s>operates on</s> specifies an operation, the operation to be performed on each one of a plurality of data elements in partitioned fields of at least one of the registers to produce a catenated result, each of the data elements having an elemental width smaller than the register width.

9. (original) The processor of claim 8 wherein the execution unit comprises a pipeline having a plurality of stages and wherein the pipeline interleaves execution of instructions from the first instruction stream with instructions from the second instruction stream.

10. (original) The processor of claim 9 wherein the pipeline is operable to simultaneously contain states of execution of an instruction from the first instruction stream and an instruction from the second instruction stream.

11. (original) The processor of claim 9 wherein execution of the instructions is interleaved in a round-robin manner.

12. (currently amended) The processor of claim 9 wherein the execution unit is further operable to, in response to decoding a second single instruction specifying a <s>first</s> <u>third</u> and a <s>second</s> <u>fourth</u> register each containing a plurality of operands, multiply the plurality of floating point operands in the <s>first</s> <u>third</u> register by the plurality of operands in the <s>second</s> <u>fourth</u> register to produce a plurality of products and provide the plurality of products to partitioned fields of a result register as a second catenated result.

13. (currently amended) A data processing system comprising:

(a) a bus coupling components in the data processing system;

(b) an external memory coupled to the bus;

(c) a programmable microprocessor coupled to the bus and capable of operation independent of another host processor, the microprocessor comprising:

a data path;

an external interface operable to receive data from an external source and communicate the received data over the data path;

a register file containing a plurality of registers each having a register width, the register file coupled to the data path and operable to support processing of a plurality of threads;

an execution unit coupled to the data path, the execution unit operable to execute a plurality of instruction streams from the plurality of threads, each instruction stream including a single instruction that <s>operates on</s> specifies an operation, the operation to be performed on each one of a plurality of data elements in partitioned fields of at least one of the registers to produce a catenated result, each of the data elements having an elemental width smaller than the register width.

14. (original) The system of claim 13 wherein the execution unit comprises a pipeline having a plurality of stages and wherein the pipeline interleaves execution of instructions from the plurality of instruction streams.
15. (original) The system of claim 14 wherein the pipeline is operable to simultaneously contain states of execution of at least two instructions from different instruction streams.
16. (original) The system of claim 14 wherein execution of the instructions is interleaved in a round-robin manner.
17. (currently amended) The system of claim 13 wherein the processor ensures only one thread from the plurality of threads can <s>have</s> handle an exception <s>handled</s> at any given time.
18. (original) The system of claim 13 further comprising a virtual memory addressing unit and a cache operable to store data communicated between the external interface and the data path.
19. (currently amended) The system of claim 13 wherein the execution unit is further operable to, in response to decoding a second single instruction specifying a <s>first</s> <u>third</u> and a <s>second</s> <u>fourth</u> register each containing a plurality of operands, multiply the plurality of floating point operands in the <s>first</s> <u>third</u> register by the plurality of operands in the <s>second</s> <u>fourth</u> register to produce a plurality of products and provide the plurality of products to partitioned fields of a result register as a second catenated result.

20. (currently amended) A data processing system comprising:

(a) a bus coupling components in the data processing system;

(b) an external memory coupled to the bus;

(c) a programmable microprocessor coupled to the bus and capable of operation independent of another host processor, the microprocessor comprising:

a data path;

an external interface operable to receive data from an external source and communicate the received data over the data path;

first and second register files containing a plurality of registers each having a register width, the first and second register files coupled to the data path and operable to support processing of first and second threads, respectively;

an execution unit coupled to the data path, the execution unit operable to execute first and second instruction streams from the first and second threads, respectively, the first and second instruction streams each including a single instruction that <s>operates on</s> specifies an operation, the operation to be performed on each one of a plurality of data elements in partitioned fields of at least one of the registers to produce a catenated result, each of the data elements having an elemental width smaller than the register width.

21. (original) The system of claim 20 wherein the execution unit comprises a pipeline having a plurality of stages and wherein the pipeline interleaves execution of instructions from the first instruction stream with instructions from the second instruction stream.

22. (original) The system of claim 21 wherein the pipeline is operable to simultaneously contain states of execution of an instruction from the first instruction stream and an instruction from the second instruction stream.

23. (original) The system of claim 21 wherein execution of the instructions is interleaved in a round-robin manner.

24. (currently amended) The system of claim 21 wherein the execution unit is further operable to, in response to decoding a second single instruction specifying a <s>first</s> <u>third</u> and a <s>second</s> <u>fourth</u> register each containing a plurality of operands, multiply the plurality of floating point operands in the <s>first</s> <u>third</u> register by the plurality of operands in the <s>second</s> <u>fourth</u> register to produce a plurality of products and provide the plurality of products to partitioned fields of a result register as a second catenated result.